DOPP Project - Terrorism and Wealth

Hypotheses

  • Does unemployment correlate with the terrorism within a country?
  • Does GDP per capita correlate with terrorism within a country?
  • Which economical and social factors influence terrorism?
  • What data is necessary to predict terrorism in the next year?

Economical and Social Influences

  • Unemployment
  • GDP
  • GDP per capita
  • GDP per capita change%
  • Life Expectancy

Definition of Terrorism

We define terrorism as the number of casualities by terrorist attacks in a country within a year.

Definition of Correlation

  • A correlation is too weak or non existent if the correlation value is lower than absolute 0.3
  • A correlation is weak if the correlation value is between absolute 0.3 and 0.7
  • A correlation is high if the correlation value is higher than absolute 0.7

Datasets

Please note, that the names of the datasets may have been changed before reading in. Terrorism dataset is of format .csv while worldbank data is excel .xls. This is due to easier reading in. For the excel file the first 3 columns have to be deleted manually in MS Excel. The excel files that were used for the project are provided on https://github.com/msieder/terror aswell.

Packages

In [1]:
import pandas as pd
import numpy as np
#pip install pycountry
import pycountry
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.linear_model import Ridge, Lasso, LinearRegression, RidgeCV
from sklearn.ensemble import RandomForestRegressor
import math
#pip install plotly
import plotly as py 
import plotly.graph_objs as go 
import plotly.express as px
# some more libraries to plot graph 
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot, plot 
  

Reading in Data

We are going to:

  • Read in 6 different datasets
  • Fix column names for terrorism dataset
  • Select a subset of terrorism dataset
  • Compute Casualities (Killed+Wounded)
  • remove 2018 and 2019 data in world bank (many NAs and no corresponding terrorism data)
  • Focus on data starting between 1992 and 2017
  • Get general statistics about the datasets

Understanding the data

In order to understand the data we should first understand the connecting terms to the specific field which the data is about. Hereby, we analyze the characteristics of the dataset (size, attribute types value ranges, sparsity, min/max values, outliers, missing values ans so on, to have a conmprehensive picture about the data we are working with.

In [2]:
# data file paths
glob_terr_data_path = "./Data/globalterrorismdb_0718dist.csv"
gdp_total_data_path = "./Data/gdp_total.xls"
gdp_capita_data_path = "./Data/gdp_capita.xls"
unemployment_data_path = "./Data/unemployment.xls"
life_data_path = "./Data/life.xls"
capita_growth_data_path = "./Data/capita_growth.xls"

Dataset 1: Data from Global Terrorism DB

Terrorism dataset from https://www.kaggle.com/START-UMD/gtd

To have a better understanding let's describe the attributes:

  • Year, Month and Day: date of incident
  • Country, Region: name of geographical region and country
  • Latitude, Longitude: exact location of incident
  • AttackType: can be bombing, armed assault, assasination, etc.
  • Killed: number of killed people
  • Wounded: number of wounded people
  • Target: exact object, person or group of people which was the aim of the attack
  • Group: terrorist group which officially claimed to be the operator behind the action
  • Target_type: target which can be infrastructural entity, utilities, transportation, citizens, officials or other people
  • Weapon_type: type of equipment which was used to commit the attacks, such as firearms or explosives
  • Motive: motivation behinf the attack
In [3]:
terror = pd.read_csv(glob_terr_data_path,encoding="ISO-8859-1")
terror.rename(columns={'iyear':'Year','imonth':'Month','iday':'Day','country_txt':'Country',
                       'region_txt':'Region','city':'City','latitude':'Latitude',
                       'longitude':'Longitude','attacktype1_txt':'AttackType',
                       'target1':'Target','nkill':'Killed','nwound':'Wounded',
                       'summary':'Summary','gname':'Group','targtype1_txt':'Target_type',
                       'weaptype1_txt':'Weapon_type','motive':'Motive'},
              inplace=True)
terror=terror[['Year','Month','Day','Country','Region','City','Latitude','Longitude','AttackType',
               'Killed','Wounded','Target','Summary','Group','Target_type','Weapon_type','Motive']]
terror['Casualities']=terror['Killed']+terror['Wounded']
terror = terror[terror.Year>=1991]

terror_head  = terror.head(5)
terror_stats = terror.loc[:,['Year',
                'Latitude',
                'Longitude',
                'Killed',
                'Wounded',
                'Casualities']]

print("Head of terror data: \n",terror_head)
print("Basic statistics about terror data: \n", terror_stats.describe())
print("Number of Null values by column \n", terror.isnull().sum(axis=0))


terror_dropped = terror.dropna(how='all')
terror_dropped.hist(figsize=(35,30),
                  bins=20)
plt.tight_layout()
plt.show()

terror_cols = terror.columns
num_terror_cols = terror._get_numeric_data().columns
non_num_terror_cols = list(set(terror_cols) - set(num_terror_cols))
print("Categorical attributes \n",non_num_terror_cols)
print()
for col in ['Region', 'Country', 'Group', 'City', 'Target_type', 'Weapon_type', 'AttackType']:
    print(terror[col].value_counts())
    print()
    
D:\Programs\Anaconda3\lib\site-packages\IPython\core\interactiveshell.py:3058: DtypeWarning:

Columns (4,6,31,33,61,62,63,76,79,90,92,94,96,114,115,121) have mixed types. Specify dtype option on import or set low_memory=False.

Head of terror data: 
        Year  Month  Day      Country                       Region  \
41338  1991      1   30       Jordan   Middle East & North Africa   
44962  1991      1    0  Philippines               Southeast Asia   
44963  1991      1    0    Guatemala  Central America & Caribbean   
44964  1991      1    0  Philippines               Southeast Asia   
44965  1991      1    1     Colombia                South America   

                        City   Latitude   Longitude  \
41338                  Amman  31.950001   35.933331   
44962                  Lopez        NaN         NaN   
44963               El Subin  16.633333  -90.183333   
44964                 Conner  17.795745  121.322798   
44965  Buenaventura district   3.881820  -77.070420   

                           AttackType  Killed  Wounded  \
41338  Facility/Infrastructure Attack     0.0      0.0   
44962               Bombing/Explosion     NaN      NaN   
44963                   Assassination     0.0      0.0   
44964                   Armed Assault     8.0      0.0   
44965               Bombing/Explosion     0.0      0.0   

                                           Target Summary  \
41338                      French Cultural Center     NaN   
44962                      30 meter cement bridge     NaN   
44963  Helicopter, President Jorge Serrano Elias*     NaN   
44964                                  detachment     NaN   
44965                        Pacific Oil Pipeline     NaN   

                                                   Group  \
41338                                            Unknown   
44962                            New People's Army (NPA)   
44963     Guatemalan National Revolutionary Unity (URNG)   
44964                            New People's Army (NPA)   
44965  Simon Bolivar Guerrilla Coordinating Board (CGSB)   

                       Target_type Weapon_type Motive  Casualities  
41338      Government (Diplomatic)  Incendiary    NaN          0.0  
44962               Transportation  Explosives    NaN          NaN  
44963  Private Citizens & Property    Firearms    NaN          0.0  
44964                     Military    Firearms    NaN          8.0  
44965                    Utilities  Explosives    NaN          0.0  
Basic statistics about terror data: 
                Year       Latitude      Longitude         Killed  \
count  136730.00000  134608.000000  134607.000000  132106.000000   
mean     2008.96673      25.592710      44.178868       2.493445   
std         7.97551      15.277044      44.792899      12.278593   
min      1991.00000     -43.532054    -176.176447       0.000000   
25%      2005.00000      13.729313      31.650077       0.000000   
50%      2013.00000      32.069286      44.371773       1.000000   
75%      2015.00000      34.467964      70.384510       2.000000   
max      2017.00000      74.633553     179.366667    1570.000000   

             Wounded    Casualities  
count  127936.000000  127537.000000  
mean        3.601809       5.761167  
std        40.483276      47.147954  
min         0.000000       0.000000  
25%         0.000000       0.000000  
50%         0.000000       1.000000  
75%         2.000000       5.000000  
max      8191.000000    9574.000000  
Number of Null values by column 
 Year               0
Month              0
Day                0
Country            0
Region             0
City             434
Latitude        2122
Longitude       2123
AttackType         0
Killed          4624
Wounded         8794
Target           257
Summary        22377
Group              0
Target_type        0
Weapon_type        0
Motive         87100
Casualities     9193
dtype: int64
Categorical attributes 
 ['Motive', 'Target_type', 'Summary', 'Weapon_type', 'City', 'AttackType', 'Group', 'Target', 'Region', 'Country']

Middle East & North Africa     46007
South Asia                     41451
Sub-Saharan Africa             15282
Southeast Asia                 10857
South America                   7195
Western Europe                  6788
Eastern Europe                  5017
Central America & Caribbean     1620
North America                   1265
Central Asia                     563
East Asia                        515
Australasia & Oceania            170
Name: Region, dtype: int64

Iraq             24600
Pakistan         14068
Afghanistan      12703
India            10359
Philippines       5546
                 ...  
Brunei               1
North Korea          1
Dominica             1
Botswana             1
International        1
Name: Country, Length: 191, dtype: int64

Unknown                                                  69298
Taliban                                                   7478
Islamic State of Iraq and the Levant (ISIL)               5613
Al-Shabaab                                                3288
Boko Haram                                                2418
                                                         ...  
Palestinian Activists                                        1
Dignity Command                                              1
Chinese Pirates                                              1
Pattani United Liberation Organization-MKP (PULO-MKP)        1
Ejercito Revolucionaria del Pueblo (ERP) (Argentina)         1
Name: Group, Length: 2316, dtype: int64

Baghdad      7570
Unknown      7079
Karachi      2572
Mosul        2262
Mogadishu    1559
             ... 
Diduki          1
Raumata         1
Chenar          1
Nema            1
Dubrajpur       1
Name: City, Length: 30899, dtype: int64

Private Citizens & Property       36052
Military                          20997
Police                            19629
Government (General)              15513
Business                          12842
Unknown                            5586
Transportation                     4582
Religious Figures/Institutions     3766
Educational Institution            3375
Utilities                          3169
Terrorists/Non-State Militia       2616
Government (Diplomatic)            1997
Journalists & Media                1911
Violent Political Party            1544
NGO                                 839
Telecommunication                   675
Airports & Aircraft                 611
Tourists                            334
Maritime                            229
Food or Water Supply                188
Abortion Related                    151
Other                               124
Name: Target_type, dtype: int64

Explosives                                                                     72179
Firearms                                                                       41736
Unknown                                                                        11155
Incendiary                                                                      7741
Melee                                                                           3249
Chemical                                                                         265
Vehicle (not to include vehicle-borne explosives, i.e., car or truck bombs)      125
Sabotage Equipment                                                               112
Other                                                                            102
Fake Weapons                                                                      28
Biological                                                                        27
Radiological                                                                      11
Name: Weapon_type, dtype: int64

Bombing/Explosion                      68067
Armed Assault                          32666
Assassination                          11271
Hostage Taking (Kidnapping)             9336
Facility/Infrastructure Attack          7508
Unknown                                 5993
Unarmed Assault                          939
Hijacking                                478
Hostage Taking (Barricade Incident)      472
Name: AttackType, dtype: int64

terror data insights

Dataset 2: GDP

GDP dataset from https://data.worldbank.org/indicator/NY.GDP.MKTP.CD

To have a better understanding let's describe the attributes:

  • Country: Name of country
  • CountryCode: 3 letter code of country
  • 1990 - 2019: yearly GDP of given country
In [4]:
gdp = pd.read_excel(gdp_total_data_path)
gdp.rename(columns={"Country Name":"CountryName","Country Code":"CountryCode"},inplace=True)
gdp.drop(gdp.columns[[range(2,34)]], axis=1,inplace=True)
gdp.drop("2018", axis=1, inplace=True)
gdp.drop("2019", axis=1, inplace=True)
gdp.head(5)

gdp_head  = gdp.head(5)
gdp_stats = gdp.loc[:,[str(e) for e in list(range(1990,2018))]]

print("Head of GDP data: \n",gdp_head)
print("Basic statistics about GDP data: \n", gdp_stats.describe())
print("Number of Null values by column \n", gdp.isnull().sum(axis=0))

gdp_dropped = gdp.dropna(how='all')
gdp_dropped['avg'] = gdp_dropped.mean(axis=1)
print(gdp_dropped.loc[:,['CountryName','avg']].nlargest(10,'avg'))
Head of GDP data: 
    CountryName CountryCode          1990          1991          1992  \
0        Aruba         ABW  7.648871e+08  8.721387e+08  9.584632e+08   
1  Afghanistan         AFG           NaN           NaN           NaN   
2       Angola         AGO  1.122876e+10  1.060378e+10  8.307811e+09   
3      Albania         ALB  2.028554e+09  1.099559e+09  6.521750e+08   
4      Andorra         AND  1.029048e+09  1.106929e+09  1.210014e+09   

           1993          1994          1995          1996          1997  ...  \
0  1.082980e+09  1.245688e+09  1.320475e+09  1.379961e+09  1.531944e+09  ...   
1           NaN           NaN           NaN           NaN           NaN  ...   
2  5.768720e+09  4.438321e+09  5.538749e+09  7.526447e+09  7.648377e+09  ...   
3  1.185315e+09  1.880952e+09  2.392765e+09  3.199643e+09  2.258516e+09  ...   
4  1.007026e+09  1.017549e+09  1.178739e+09  1.223945e+09  1.180597e+09  ...   

           2008          2009          2010          2011          2012  \
0  2.745251e+09  2.498883e+09  2.390503e+09  2.549721e+09  2.534637e+09   
1  1.010922e+10  1.243909e+10  1.585657e+10  1.780428e+10  2.000162e+10   
2  8.853861e+10  7.030716e+10  8.379950e+10  1.117897e+11  1.280529e+11   
3  1.288135e+10  1.204422e+10  1.192696e+10  1.289087e+10  1.231978e+10   
4  4.007353e+09  3.660531e+09  3.355695e+09  3.442063e+09  3.164615e+09   

           2013          2014          2015          2016          2017  
0  2.581564e+09  2.649721e+09  2.691620e+09  2.646927e+09  2.700559e+09  
1  2.056105e+10  2.048487e+10  1.990711e+10  1.936264e+10  2.019176e+10  
2  1.367099e+11  1.457122e+11  1.161936e+11  1.011239e+11  1.221238e+11  
3  1.277628e+10  1.322824e+10  1.138693e+10  1.186135e+10  1.302506e+10  
4  3.281585e+09  3.350736e+09  2.811489e+09  2.877312e+09  3.013387e+09  

[5 rows x 30 columns]
Basic statistics about GDP data: 
                1990          1991          1992          1993          1994  \
count  2.190000e+02  2.180000e+02  2.190000e+02  2.240000e+02  2.260000e+02   
mean   7.673053e+11  8.139883e+11  8.551928e+11  8.496997e+11  9.030065e+11   
std    2.875431e+12  3.043284e+12  3.246939e+12  3.240003e+12  3.456284e+12   
min    8.824448e+06  9.365166e+06  9.742949e+06  9.630763e+06  1.088683e+07   
25%    2.641438e+09  2.671369e+09  2.540916e+09  2.564262e+09  2.525683e+09   
50%    1.229057e+10  1.136035e+10  1.139631e+10  1.313998e+10  1.290901e+10   
75%    1.641505e+11  1.710247e+11  1.672990e+11  1.817524e+11  1.679646e+11   
max    2.265557e+13  2.398109e+13  2.546445e+13  2.586975e+13  2.777514e+13   

               1995          1996          1997          1998          1999  \
count  2.340000e+02  2.340000e+02  2.340000e+02  2.360000e+02  2.370000e+02   
mean   9.750219e+11  1.004262e+12  1.003393e+12  9.932857e+11  1.015182e+12   
std    3.774895e+12  3.831979e+12  3.789375e+12  3.791364e+12  3.939836e+12   
min    1.102595e+07  1.233485e+07  1.270091e+07  1.275763e+07  1.368714e+07   
25%    2.733147e+09  3.105664e+09  3.303352e+09  3.128167e+09  3.212119e+09   
50%    1.390121e+10  1.435287e+10  1.542189e+10  1.521143e+10  1.571015e+10   
75%    1.822420e+11  1.907199e+11  1.990760e+11  1.915071e+11  1.878302e+11   
max    3.087130e+13  3.155461e+13  3.143965e+13  3.137824e+13  3.254267e+13   

       ...          2008          2009          2010          2011  \
count  ...  2.510000e+02  2.510000e+02  2.510000e+02  2.520000e+02   
mean   ...  2.026050e+12  1.921724e+12  2.131839e+12  2.385001e+12   
std    ...  7.147969e+12  6.755116e+12  7.313702e+12  8.079482e+12   
min    ...  3.029077e+07  2.710131e+07  3.182470e+07  3.871328e+07   
25%    ...  6.082389e+09  5.819647e+09  6.906562e+09  7.602980e+09   
50%    ...  3.589515e+10  3.702151e+10  4.133801e+10  4.461484e+10   
75%    ...  4.862393e+11  4.304036e+11  4.801365e+11  5.309892e+11   
max    ...  6.361162e+13  6.033414e+13  6.605122e+13  7.339319e+13   

               2012          2013          2014          2015          2016  \
count  2.510000e+02  2.530000e+02  2.520000e+02  2.510000e+02  2.500000e+02   
mean   2.460850e+12  2.530572e+12  2.614439e+12  2.469935e+12  2.508480e+12   
std    8.255169e+12  8.469086e+12  8.717078e+12  8.273257e+12  8.413866e+12   
min    3.767173e+07  3.750751e+07  3.729184e+07  3.549233e+07  3.654742e+07   
25%    7.997062e+09  8.401993e+09  9.029027e+09  8.286053e+09  8.172516e+09   
50%    4.658046e+10  5.155208e+10  5.447456e+10  4.997389e+10  5.141799e+10   
75%    5.484547e+11  5.242343e+11  5.511666e+11  5.118846e+11  5.139712e+11   
max    7.508513e+13  7.723632e+13  7.933269e+13  7.504947e+13  7.616390e+13   

               2017  
count  2.470000e+02  
mean   2.715690e+12  
std    8.987376e+12  
min    4.062056e+07  
25%    9.469677e+09  
50%    5.648899e+10  
75%    5.927060e+11  
max    8.095067e+13  

[8 rows x 28 columns]
Number of Null values by column 
 CountryName     0
CountryCode     0
1990           45
1991           46
1992           45
1993           40
1994           38
1995           30
1996           30
1997           30
1998           28
1999           27
2000           20
2001           20
2002           15
2003           15
2004           14
2005           13
2006           13
2007           12
2008           13
2009           13
2010           13
2011           12
2012           13
2013           11
2014           12
2015           13
2016           14
2017           17
dtype: int64
                   CountryName           avg
257                      World  4.828540e+13
93                 High income  3.529773e+13
179               OECD members  3.480384e+13
196  Post-demographic dividend  3.328144e+13
63       Europe & Central Asia  1.559325e+13
101           IDA & IBRD total  1.346986e+13
168              North America  1.321464e+13
71              European Union  1.306146e+13
138        Low & middle income  1.298469e+13
154              Middle income  1.273848e+13
D:\Programs\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:4291: FutureWarning:

Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.

gdp results

Dataset 3: GDP per Capita

GDP per capita from https://data.worldbank.org/indicator/NY.GDP.PCAP.CD

The same rules apply and the same terms are used here as at the GDP dataset, only difference is that now we have values per Capita per Country.

In [5]:
capita = pd.read_excel("./Data/gdp_capita.xls")
capita.rename(columns={"Country Name":"CountryName","Country Code":"CountryCode"},inplace=True)
capita.drop(capita.columns[[range(2,34)]], axis=1,inplace=True)
capita.drop("2018", axis=1, inplace=True)
capita.drop("2019", axis=1, inplace=True)

capita_head  = capita.head(5)
capita_stats = capita.loc[:,[str(e) for e in list(range(1990,2018))]]

print("Head of GDP/capita data: \n",capita_head)
print("Basic statistics about GDP/capita data: \n", capita_stats.describe())
print("Number of Null values by column \n", capita.isnull().sum(axis=0))

capita_dropped = capita.dropna(how='all')
capita_dropped.hist(figsize=(35,30),
                  bins=20)
plt.tight_layout()
plt.show()

capita_dropped['avg'] = capita_dropped.mean(axis=1)
print(capita_dropped.loc[:,['CountryName','avg']].nlargest(10,'avg'))
Head of GDP/capita data: 
    CountryName CountryCode          1990          1991          1992  \
0        Aruba         ABW  12307.311738  13496.003385  14046.503997   
1  Afghanistan         AFG           NaN           NaN           NaN   
2       Angola         AGO    947.704182    865.692730    656.361756   
3      Albania         ALB    617.230436    336.586995    200.852220   
4      Andorra         AND  18878.505969  19532.540150  20547.711790   

           1993          1994          1995          1996          1997  ...  \
0  14936.827039  16241.046325  16439.356361  16586.068436  17927.749635  ...   
1           NaN           NaN           NaN           NaN           NaN  ...   
2    441.200673    328.673295    397.179451    522.643807    514.295223  ...   
3    367.279225    586.416340    750.604449   1009.977668    717.380567  ...   
4  16516.471027  16234.809010  18461.064858  19017.174590  18353.059722  ...   

           2008          2009          2010          2011          2012  \
0  27084.703690  24630.453714  23512.602596  24985.993281  24713.698045   
1    364.660465    438.076034    543.303042    591.162346    641.872034   
2   4080.941410   3122.780766   3587.883798   4615.468028   5100.095808   
3   4370.540087   4114.140150   4094.362119   4437.178067   4247.614279   
4  47785.089273  43338.866758  39736.354063  41100.729938  38392.943901   

           2013          2014          2015          2016          2017  
0  25025.099563  25533.569780  25796.380251  25239.600411  25630.266492  
1    637.165044    613.856333    578.466353    547.228110    556.302139  
2   5254.882338   5408.410496   4166.979684   3506.072885   4095.812942  
3   4413.081743   4578.666720   3952.829458   4124.108907   4532.890162  
4  40626.751632  42300.334128  36039.653496  37224.108916  39134.393371  

[5 rows x 30 columns]
Basic statistics about GDP/capita data: 
                1990          1991          1992          1993          1994  \
count    219.000000    218.000000    218.000000    223.000000    225.000000   
mean    5981.261379   6103.918147   6420.623438   6137.273522   6456.077882   
std     9990.346732  10170.161411  10922.576371  10378.983969  11077.114251   
min       95.188250    138.447454    139.200137    157.060818    121.264128   
25%      576.302611    600.536495    521.540638    529.212361    546.203203   
50%     1569.739433   1578.372750   1588.112443   1590.095882   1686.846001   
75%     7218.969985   6939.020276   7169.585191   6608.031685   7092.237948   
max    84289.559544  83738.354552  91647.982647  85399.058680  89380.574775   

                1995           1996          1997          1998          1999  \
count     234.000000     234.000000    234.000000    236.000000    237.000000   
mean     7216.826359    7441.954792   7314.548000   7532.615295   7868.314498   
std     12539.600734   12707.697295  11943.499312  12358.583803  12911.272692   
min       134.342956     134.981924    138.974046    125.076141    102.597978   
25%       610.138585     614.487362    651.704239    619.705209    635.766242   
50%      2065.027229    2127.262005   2075.044961   1978.227328   1925.690866   
75%      7243.763956    7498.674236   7992.308773   8136.369079   9322.626714   
max    101910.108580  101237.257656  90798.664354  92995.906360  91257.672016   

       ...           2008           2009           2010           2011  \
count  ...     251.000000     251.000000     251.000000     252.000000   
mean   ...   15727.346214   14014.186451   14717.713113   16194.601132   
std    ...   24526.383036   21447.562784   22012.960202   24210.620724   
min    ...     198.352901     212.136880     234.235647     249.577979   
25%    ...    1475.569828    1360.792080    1578.201816    1752.007345   
50%    ...    5090.932345    4950.294791    5555.390949    6092.607440   
75%    ...   19377.775250   15999.640889   16705.323353   19312.208501   
max    ...  185721.794154  154762.199427  150585.448911  168785.940809   

                2012           2013           2014           2015  \
count     251.000000     253.000000     252.000000     251.000000   
mean    16114.360342   16593.630103   16600.528105   14967.644875   
std     23412.688132   24937.784367   25317.143749   22679.171192   
min       252.358980     256.976003     274.857948     293.455236   
25%      1912.362729    2004.504298    2104.206968    2067.475587   
50%      6586.721279    6832.456891    6640.856256    6124.491643   
75%     19443.643774   19916.019387   19462.312835   17106.400142   
max    157515.899069  177593.351895  189170.895671  167290.939984   

                2016           2017  
count     250.000000     247.000000  
mean    15093.313512   14920.125410  
std     22859.259450   21320.002347  
min       282.193130     292.997631  
25%      2124.675302    2042.465642  
50%      5924.917489    6213.501276  
75%     17821.571228   17136.270746  
max    169915.804840  167101.759377  

[8 rows x 28 columns]
Number of Null values by column 
 CountryName     0
CountryCode     0
1990           45
1991           46
1992           46
1993           41
1994           39
1995           30
1996           30
1997           30
1998           28
1999           27
2000           20
2001           20
2002           15
2003           15
2004           14
2005           13
2006           13
2007           12
2008           13
2009           13
2010           13
2011           12
2012           13
2013           11
2014           12
2015           13
2016           14
2017           17
dtype: int64
         CountryName            avg
147           Monaco  125432.680483
135    Liechtenstein  105002.775493
50    Cayman Islands   75463.629022
142       Luxembourg   75312.614917
25           Bermuda   61284.313495
175           Norway   59471.834794
210       San Marino   59009.555365
35       Switzerland   57601.354937
36   Channel Islands   51427.189627
106      Isle of Man   48600.155219

Dataset 4: Unemployment data

Unemployment total % https://data.worldbank.org/indicator/SL.UEM.TOTL.ZS

To have a better understanding let's describe the attributes:

  • Country: Name of country
  • CountryCode: 3 letter code of country
  • 1990 - 2019: percentage of unemployment in given country in given year
In [6]:
unemp = pd.read_excel("./Data/unemployment.xls")
unemp.rename(columns={"Country Name":"CountryName","Country Code":"CountryCode"},inplace=True)
unemp.drop(unemp.columns[[range(2,34)]], axis=1,inplace=True)
unemp.drop("2018", axis=1, inplace=True)
unemp.drop("2019", axis=1, inplace=True)

unemp_head  = unemp.head(5)
unemp_stats = unemp.loc[:,[str(e) for e in list(range(1990,2018))]]

print("Head of Unemployment data: \n",unemp_head)
print("Basic statistics about Unemployment data: \n", unemp_stats.describe())
print("Number of Null values by column \n", unemp.isnull().sum(axis=0))

unemp_dropped = unemp.dropna(how='all')
unemp_dropped.hist(figsize=(35,30),
                  bins=20)
plt.tight_layout()
plt.show()

unemp_dropped['avg'] = unemp_dropped.mean(axis=1)
print(unemp_dropped.loc[:,['CountryName','avg']].nlargest(10,'avg'))
Head of Unemployment data: 
    CountryName CountryCode  1990       1991       1992    1993    1994  \
0        Aruba         ABW   NaN        NaN        NaN     NaN     NaN   
1  Afghanistan         AFG   NaN   2.976000   3.173000   3.463   3.612   
2       Angola         AGO   NaN  22.601999  20.924999  21.250  21.159   
3      Albania         ALB   NaN  16.781000  17.653000  17.681  17.527   
4      Andorra         AND   NaN        NaN        NaN     NaN     NaN   

        1995    1996       1997  ...    2008    2009    2010    2011    2012  \
0        NaN     NaN        NaN  ...     NaN     NaN     NaN     NaN     NaN   
1   3.653000   3.621   3.603000  ...   2.494   2.470   2.275   1.984   1.692   
2  21.148001  20.066  21.465000  ...  12.044  10.609   9.089   7.362   7.359   
3  17.607000  18.358  18.311001  ...  13.060  13.674  14.086  13.481  13.376   
4        NaN     NaN        NaN  ...     NaN     NaN     NaN     NaN     NaN   

     2013    2014    2015    2016    2017  
0     NaN     NaN     NaN     NaN     NaN  
1   1.725   1.735   1.679   1.634   1.559  
2   7.454   7.429   7.279   7.281   7.139  
3  15.866  17.490  17.080  15.220  13.750  
4     NaN     NaN     NaN     NaN     NaN  

[5 rows x 30 columns]
Basic statistics about Unemployment data: 
        1990        1991        1992        1993        1994        1995  \
count   0.0  233.000000  233.000000  233.000000  233.000000  233.000000   
mean    NaN    7.243633    7.477937    8.072329    8.281505    8.414185   
std     NaN    6.072201    6.125838    6.279556    6.227485    6.256061   
min     NaN    0.300000    0.334000    0.401000    0.474000    0.526000   
25%     NaN    2.698000    3.096000    3.590718    3.755140    3.892070   
50%     NaN    5.440000    5.910000    6.237014    6.750000    7.182000   
75%     NaN   10.102000   10.205000   10.920000   11.100000   11.040000   
max     NaN   36.126999   36.238998   36.637001   37.265999   37.075001   

             1996        1997        1998        1999  ...        2008  \
count  233.000000  233.000000  233.000000  233.000000  ...  233.000000   
mean     8.528492    8.461068    8.482511    8.678133  ...    6.892261   
std      6.228282    6.146730    6.034958    6.051520  ...    5.053773   
min      0.560000    0.617000    0.652000    0.700000  ...    0.310000   
25%      3.998000    4.031000    4.065000    4.308000  ...    3.698000   
50%      7.300000    7.186000    7.307000    7.113277  ...    5.847892   
75%     11.500000   11.194000   11.786954   11.994570  ...    8.348000   
max     37.126999   37.939999   37.167000   36.812000  ...   33.761002   

             2009        2010        2011        2012        2013        2014  \
count  233.000000  233.000000  233.000000  233.000000  233.000000  233.000000   
mean     7.824932    7.963419    7.892899    7.903058    7.916960    7.709996   
std      5.247146    5.495426    5.514358    5.634997    5.657360    5.553007   
min      0.310000    0.450000    0.319000    0.332000    0.270000    0.190000   
25%      4.197000    4.169257    4.110000    4.048000    4.096000    4.032079   
50%      6.662000    6.926000    6.510097    6.412558    6.337476    6.130268   
75%      9.622000   10.279000   10.200000   10.290076   10.211820   10.206868   
max     32.179001   32.020000   31.378000   31.016001   28.996000   28.030001   

             2015        2016        2017  
count  233.000000  233.000000  233.000000  
mean     7.588368    7.443758    7.134166  
std      5.391255    5.225532    5.068527  
min      0.160000    0.140000    0.140000  
25%      4.057183    4.010240    3.873328  
50%      6.168000    6.161000    5.760000  
75%      9.873000    9.670000    9.292000  
max     27.650000   26.889999   27.444000  

[8 rows x 28 columns]
Number of Null values by column 
 CountryName      0
CountryCode      0
1990           264
1991            31
1992            31
1993            31
1994            31
1995            31
1996            31
1997            31
1998            31
1999            31
2000            31
2001            31
2002            31
2003            31
2004            31
2005            31
2006            31
2007            31
2008            31
2009            31
2010            31
2011            31
2012            31
2013            31
2014            31
2015            31
2016            31
2017            31
dtype: int64
                        CountryName        avg
155                 North Macedonia  32.676260
139                         Lesotho  31.541444
261                    South Africa  27.949074
22           Bosnia and Herzegovina  25.213963
160                      Montenegro  25.151741
222                        Eswatini  24.670185
194              West Bank and Gaza  21.011111
169                         Namibia  20.871296
251  St. Vincent and the Grenadines  20.133593
58                          Algeria  18.951333

Dataset 5: Life Expectancy data

Life Expectancy at birth in Years https://data.worldbank.org/indicator/SP.DYN.LE00.IN

To have a better understanding let's describe the attributes:

  • Country: Name of country
  • CountryCode: 3 letter code of country
  • 1990 - 2019: life expectancy at birth in years
In [7]:
life = pd.read_excel("./Data/life.xls")
life.rename(columns={"Country Name":"CountryName","Country Code":"CountryCode"},inplace=True)
life.drop(life.columns[[range(2,34)]], axis=1,inplace=True)
life.drop("2018", axis=1, inplace=True)
life.drop("2019", axis=1, inplace=True)

life_head  = life.head(5)
life_stats = life.loc[:,[str(e) for e in list(range(1990,2018))]]

print("Head of Life data: \n",life_head)
print("Basic statistics about Life data: \n", life_stats.describe())
print("Number of Null values by column \n", life.isnull().sum(axis=0))

life_dropped = life.dropna(how='all')
life_dropped.hist(figsize=(35,30),
                  bins=20)
plt.tight_layout()
plt.show()


life_dropped['avg'] = life_dropped.mean(axis=1)
print(life_dropped.loc[:,['CountryName','avg']].nlargest(10,'avg'))
Head of Life data: 
    CountryName CountryCode    1990    1991    1992    1993    1994    1995  \
0        Aruba         ABW  73.468  73.509  73.544  73.573  73.598  73.622   
1  Afghanistan         AFG  50.331  50.999  51.641  52.256  52.842  53.398   
2       Angola         AGO  45.306  45.271  45.230  45.201  45.201  45.246   
3      Albania         ALB  71.836  71.803  71.802  71.860  71.992  72.205   
4      Andorra         AND     NaN     NaN     NaN     NaN     NaN     NaN   

     1996    1997  ...    2008    2009    2010    2011    2012    2013  \
0  73.646  73.671  ...  74.725  74.872  75.017  75.158  75.299  75.441   
1  53.924  54.424  ...  59.930  60.484  61.028  61.553  62.054  62.525   
2  45.350  45.519  ...  53.243  54.311  55.350  56.330  57.236  58.054   
3  72.495  72.838  ...  75.912  76.221  76.562  76.914  77.252  77.554   
4     NaN     NaN  ...     NaN     NaN     NaN     NaN     NaN     NaN   

     2014    2015    2016    2017  
0  75.583  75.725  75.868  76.010  
1  62.966  63.377  63.763  64.130  
2  58.776  59.398  59.925  60.379  
3  77.813  78.025  78.194  78.333  
4     NaN     NaN     NaN     NaN  

[5 rows x 30 columns]
Basic statistics about Life data: 
              1990        1991        1992        1993        1994        1995  \
count  243.000000  244.000000  244.000000  242.000000  244.000000  244.000000   
mean    64.733739   64.954614   65.083176   65.133274   65.391410   65.534107   
std      9.498222    9.589702    9.651244    9.694110    9.754469    9.708618   
min     33.413000   29.248000   26.691000   26.172000   27.738000   31.037000   
25%     58.122400   58.221000   58.170750   58.063500   58.485000   58.574250   
50%     67.860414   67.965854   67.689865   67.657500   67.909500   68.286016   
75%     71.557000   71.805500   71.818531   71.929750   72.338250   72.511695   
max     78.836829   79.100732   79.153902   79.293659   79.687073   79.536341   

             1996        1997        1998        1999  ...        2008  \
count  243.000000  246.000000  243.000000  244.000000  ...  247.000000   
mean    65.745921   66.053051   66.209569   66.437586  ...   69.430214   
std      9.738483    9.724062    9.778984    9.771970  ...    8.907527   
min     35.380000   37.496000   37.980000   38.634000  ...   43.384000   
25%     58.987500   59.431000   59.191500   59.655000  ...   63.978000   
50%     68.695000   68.897000   68.951220   69.187000  ...   71.812195   
75%     72.748329   73.015750   73.440500   73.616500  ...   75.422378   
max     80.200244   80.424146   80.501463   80.570732  ...   82.682927   

             2009        2010        2011        2012        2013        2014  \
count  247.000000  246.000000  246.000000  248.000000  246.000000  246.000000   
mean    69.824634   70.193102   70.567857   70.973960   71.240128   71.568042   
std      8.699224    8.528535    8.337266    8.148554    7.948926    7.808928   
min     44.146000   45.100000   46.207000   47.416000   48.663000   49.891000   
25%     64.383000   64.547637   65.155981   65.816615   66.142500   66.498500   
50%     72.067000   72.051500   72.236934   72.473500   72.611155   72.866264   
75%     75.689061   76.007174   76.277067   76.630878   76.936295   77.079787   
max     82.931463   82.978049   83.421951   85.417073   83.831707   83.980488   

             2015        2016        2017  
count  246.000000  245.000000  245.000000  
mean    71.808315   72.090487   72.321203  
std      7.606823    7.508556    7.381138  
min     50.881000   51.593000   52.240000  
25%     67.150000   67.175000   67.380000  
50%     73.119124   73.395956   73.554000  
75%     77.242928   77.470000   77.632000  
max     84.278049   84.226829   84.680488  

[8 rows x 28 columns]
Number of Null values by column 
 CountryName     0
CountryCode     0
1990           21
1991           20
1992           20
1993           22
1994           20
1995           20
1996           21
1997           18
1998           21
1999           20
2000           17
2001           19
2002           16
2003           19
2004           19
2005           18
2006           18
2007           17
2008           17
2009           17
2010           18
2011           18
2012           16
2013           18
2014           18
2015           18
2016           19
2017           19
dtype: int64
              CountryName        avg
210            San Marino  85.417073
50         Cayman Islands  82.190244
117                 Japan  81.598336
94   Hong Kong SAR, China  81.348258
144      Macao SAR, China  81.033893
35            Switzerland  80.657439
112               Iceland  80.600984
114                 Italy  80.348780
135         Liechtenstein  80.318394
221                Sweden  80.217962

Dataset 6: GDP per capita growth data

GDP per capita growth % https://data.worldbank.org/indicator/NY.GDP.PCAP.KD.ZG

To have a better understanding let's describe the attributes:

  • Country: Name of country
  • CountryCode: 3 letter code of country
  • 1990 - 2019: GDP per capita growth in percentage
In [8]:
capita_growth = pd.read_excel("./Data/capita_growth.xls")
capita_growth.rename(columns={"Country Name":"CountryName","Country Code":"CountryCode"},inplace=True)
capita_growth.drop(capita_growth.columns[[range(2,34)]], axis=1,inplace=True)
capita_growth.drop("2018", axis=1, inplace=True)
capita_growth.drop("2019", axis=1, inplace=True)
capita_growth.head(5)

capita_growth_head  = capita_growth.head(5)
capita_growth_stats = capita_growth.loc[:,[str(e) for e in list(range(1990,2018))]]

print("Head of Capita Growth data: \n",capita_growth_head)
print("Basic statistics about Capita Growth data: \n", capita_growth_stats.describe())
print("Number of Null values by column \n", capita_growth.isnull().sum(axis=0))

capita_growth_dropped = capita_growth.dropna(how='all')
capita_growth_dropped.hist(figsize=(35,30),
                  bins=20)
plt.tight_layout()
plt.show()

capita_growth_dropped['avg'] = capita_growth_dropped.mean(axis=1)
print(capita_growth_dropped.loc[:,['CountryName','avg']].nlargest(10,'avg'))
Head of Capita Growth data: 
    CountryName CountryCode       1990       1991      1992       1993  \
0        Aruba         ABW   2.092910   3.831274  0.275949   0.989468   
1  Afghanistan         AFG        NaN        NaN       NaN        NaN   
2       Angola         AGO  -6.657532  -2.310860 -8.876967 -26.411770   
3      Albania         ALB -11.187905 -27.566821 -6.622551  10.229949   
4      Andorra         AND  -0.142615  -1.366130 -2.870545  -4.412623   

       1994       1995      1996       1997  ...      2008       2009  \
0  2.284430  -2.079505 -2.311930   4.226981  ... -0.224764 -10.605299   
1       NaN        NaN       NaN        NaN  ...  1.594211  18.515369   
2 -1.877816  11.359481  9.952817   3.877850  ...  7.116873  -2.808634   
3  8.969762  14.024496  9.780180 -10.361105  ...  8.328036   4.048889   
4 -0.403572   0.869722  3.820477   9.123624  ... -9.874030  -4.375947   

        2010      2011      2012      2013      2014      2015      2016  \
0  -3.887760  3.063882 -1.864168  3.593198  0.250567 -0.991548 -0.716487   
1  11.264133 -2.681081  8.974880  1.974169 -0.665271 -1.622887 -0.541697   
2   1.079169 -0.220847  4.706459  1.292086  1.219833 -2.468715 -5.816237   
3   4.223037  2.821559  1.585155  1.187205  1.985426  2.516852  3.480117   
4  -5.343136 -3.847253 -0.039668  2.405292  4.293204  2.395989  2.830284   

       2017  
0  0.855431  
1  0.082079  
2 -3.409903  
3  3.916612  
4  2.115060  

[5 rows x 30 columns]
Basic statistics about Capita Growth data: 
              1990        1991        1992        1993        1994        1995  \
count  207.000000  217.000000  218.000000  220.000000  222.000000  225.000000   
mean     1.586155   -0.456900   -0.204424    0.118577    0.546748    2.231781   
std      6.410946    7.438532    7.880799    6.189178    7.029085    4.971100   
min    -14.765157  -64.992373  -45.325107  -29.841290  -47.503316  -13.717480   
25%     -1.590441   -2.262144   -2.456523   -1.855973   -1.167752    0.510230   
50%      1.315112    0.092618    0.569948    0.626380    1.554461    2.170550   
75%      3.697548    2.442359    3.433583    3.033777    3.629991    3.820568   
max     53.974878   43.377732   30.403196   26.493187   22.324979   37.535528   

             1996        1997        1998        1999  ...        2008  \
count  233.000000  233.000000  236.000000  238.000000  ...  249.000000   
mean     3.188479    3.383361    2.106239    1.816957  ...    2.366015   
std      7.869292   10.008220    4.805939    4.105126  ...    4.362140   
min    -17.932055  -12.778706  -29.461593  -11.828002  ...  -18.491136   
25%      1.017186    1.024652    0.401867   -0.433113  ...    0.315908   
50%      2.363224    2.686307    2.064746    1.823883  ...    2.268811   
75%      4.394507    4.372417    4.026052    3.723562  ...    4.612470   
max     92.201557  140.370770   30.742349   20.635900  ...   33.753378   

             2009        2010        2011        2012        2013        2014  \
count  250.000000  250.000000  250.000000  250.000000  250.000000  250.000000   
mean    -1.413904    2.951821    2.288967    2.117726    1.879337    1.858002   
std      4.889829    3.865572    5.759400    8.930456    4.409531    3.810196   
min    -16.873719  -12.981689  -62.378077  -47.590580  -36.556820  -27.234410   
25%     -4.350620    1.331174    0.762287   -0.015134    0.194669    0.554218   
50%     -1.334482    2.995555    2.564913    1.662196    1.900270    1.849430   
75%      1.531165    4.745888    4.739066    3.732839    3.677982    3.433539   
max     18.515369   22.512786   18.886635  121.779543   27.492450   24.637414   

             2015        2016        2017  
count  250.000000  249.000000  249.000000  
mean     1.439939    1.599612    1.832682  
std      4.317331    3.711398    3.740553  
min    -23.141193  -16.385328  -14.363688  
25%      0.044941    0.185060    0.395740  
50%      1.692211    1.878164    1.990240  
75%      3.395726    3.149324    3.477736  
max     23.985510   27.379347   24.971131  

[8 rows x 28 columns]
Number of Null values by column 
 CountryName     0
CountryCode     0
1990           57
1991           47
1992           46
1993           44
1994           42
1995           39
1996           31
1997           31
1998           28
1999           26
2000           25
2001           20
2002           19
2003           14
2004           14
2005           14
2006           14
2007           14
2008           15
2009           14
2010           14
2011           14
2012           14
2013           14
2014           14
2015           14
2016           15
2017           15
dtype: int64
                                     CountryName        avg
86                             Equatorial Guinea  14.383944
177                                        Nauru  13.441598
22                        Bosnia and Herzegovina  10.811663
38                                         China   8.698387
158                                      Myanmar   7.664357
59   East Asia & Pacific (excluding high income)   7.063876
228   East Asia & Pacific (IDA & IBRD countries)   7.062827
235                                  Timor-Leste   6.199368
111                                         Iraq   5.683611
141                                    Lithuania   5.519613

Data Cleaning / Merging / etc

World Bank Data

  • Interpolating Missing Values
  • Rounding Values where fitting
  • Calculating Average for each country (Used in plotting)
  • Filtering for Countries that are available in Terrorism dataset

Terrorism Data

  • Around 9.000 terror attacks cannot be assigned to a country with gdp (International)
  • 153 Countries of the GDP dataset have had a terror attack between 1991 and 2018
  • Mapping Country Names of Terrorism Database to World Bank Names
  • Performing ISO-Lookup for the countries (ISO-3)
  • Exclude Big Attacks with over 350 Deaths as outliers
  • Grouping terrorist attacks by country and by country+year
  • No Data for Year 1993 (and 2018) -> interpolation for 1993

Combination of Datasets

  • Generation of new Dataframe
  • Country Information: Name + Code
  • GDP Information: Log_Scaled, Standardized, Change%, Small Country (=0,25Quantile)
  • Capita Information: Standardized, Change%
  • Unemployment
  • Life Expectancy
  • Terroristic Information
  • Generation of Information about past terrorist activity in countries (Last_Year, Last_3_Years, Last_5_Years)

Economical and Social Data is added from the year before. For the prediction part this will not be able for the target year, as the prediction would happen at the start of a year. Therefore data of the previous year is needed.

In [9]:
# Compute the average values (later used in plotting) over the years 1992-2018
gdp["Average"] = gdp[gdp.keys()[2:]].mean(axis=1)
capita["Average"] = capita[capita.keys()[2:]].mean(axis=1)
unemp["Average"] = unemp[unemp.keys()[2:]].mean(axis=1)
life["Average"] = life[life.keys()[2:]].mean(axis=1)
capita_growth["Average"] = capita_growth[capita_growth.keys()[2:]].mean(axis=1)
In [10]:
#round the world bank data values
gdp = round(gdp)
capita = round(capita,3)
unemp = round(unemp,3)
life = round(life,3)
capita_growth = round(capita_growth,3)
In [11]:
# Interpolate missing values row-wise (by country)
gdp[gdp.keys()[2:30]] = gdp[gdp.keys()[2:30]].interpolate(method="linear",limit_direction='both',axis=1)
capita[capita.keys()[2:30]] = capita[capita.keys()[2:30]].interpolate(method="linear",limit_direction='both',axis=1)
unemp[unemp.keys()[2:30]] = unemp[unemp.keys()[2:30]].interpolate(method="linear",limit_direction='both',axis=1)
life[life.keys()[2:30]] = life[life.keys()[2:30]].interpolate(method="linear",limit_direction='both',axis=1)
capita_growth[capita_growth.keys()[2:30]] = capita_growth[capita_growth.keys()[2:30]].interpolate(method="linear",limit_direction='both',axis=1)
In [12]:
# Disregard "big" terrorist attacks with 350 or more killed. These datapoints are
# outliers and therefore not reasonable for analysis.
# Example 9/11 or Omu Shinrikyo (Tokio Subway Poison Attack)
terror = terror[terror.Killed<350]


# Mapping of Countries to official UN Names
terror = terror.replace({'Country': {"Russia":"Russian Federation",
                                     "Czech Republic":"Czechia",
                                     "Slovak Republic":"Slovakia",
                                     "Iran":"Iran, Islamic Republic of",
                                     "Syria":"Syrian Arab Republic",
                                     "Venezuela":"Venezuela, Bolivarian Republic of",
                                     "North Korea":"Korea, Democratic People's Republic of",
                                     "South Korea":"Korea, Republic of",
                                     "Democratic Republic of the Congo":"Congo, The Democratic Republic of the",
                                     "Zaire":"Congo, The Democratic Republic of the",
                                     "Republic of the Congo":"Congo",
                                     "Bolivia":"Bolivia, Plurinational State of",
                                     "Bosnia-Herzegovina":"Bosnia and Herzegovina",
                                     "Brunei":"Brunei Darussalam",
                                     "East Timor":"Timor-Leste",
                                     "Ivory Coast":"Côte d'Ivoire",
                                     "Tanzania":"Tanzania, United Republic of",
                                     "Vietnam":"Viet Nam",
                                     "Taiwan":"Taiwan, Province of China",
                                     "Macedonia":"North Macedonia",
                                     "Moldova":"Moldova, Republic of",
                                     "Laos":"Lao People's Democratic Republic",
                                     "Serbia-Montenegro":"Montenegro",
                                     "Swaziland":"Eswatini",
                                     "Macau":"Macao",
                                     "St. Kitts and Nevis":"Saint Kitts and Nevis",
                                     "St. Lucia":"Saint Lucia",
                                     "West Bank and Gaza Strip":"Palestine, State of"}})



terror = terror.replace("Kosovo","Serbia") #not available in the UN-Data.
terror = terror.replace("Yugoslavia","Serbia") #since 2006 official successor
terror = terror.replace("Czechoslovakia","Czechia") #Czechia as the bigger successor of Czechoslovakia
terror = terror.replace("East Germany (GDR)","Germany") #Already united in 1992
terror = terror.replace("West Germany (FRG)","Germany") #Already united in 1992
terror = terror.replace("Soviet Union","Russian Federation") #Already fell apart in 1992

print(len(terror)-sum(terror.Country.isin(gdp.CountryName).astype(int)))
# Around 9.000 Terror attacks cannot be assigned to a country with gdp
print(sum(gdp.CountryName.isin(terror.Country).astype(int)))
#154 Countries from gdp are also in the terrorism dataset.
#Numbers might change further depending on preprocessing
8857
153
In [13]:
#Grouping the dataset by Attacks and Casualities
terrorgroup = pd.DataFrame()
terrorgroup[["Attacks","Casualities"]] = terror.groupby("Country").agg({'Country':'count', 'Casualities': 'sum'})
In [14]:
# ISO lookup
input_countries = terrorgroup.index.values

countries = {}
for country in pycountry.countries:
    countries[country.name] = country.alpha_3

codes = [countries.get(country, 'Unknown code') for country in input_countries]

# Adding to dataframe
terrorgroup["ISO"]=codes
terrorgroup["Country"]=terrorgroup.index.values
In [15]:
# Generating datasets for attacks and casualities
# Split up by Year (for every country for every year)
# Data from 1993 is missing. -> backward interpolation
terroryear = pd.DataFrame()
terroryear[["Attacks","Casualities"]] = terror.groupby(["Country","Year"]).agg({'Country':'count', 'Casualities': 'sum'})
terroryear.reset_index(inplace=True)
terroryear["ISO"]=[countries.get(country, 'Unknown code') for country in terroryear.Country]

test = pd.DataFrame()
test["Country"]=gdp.CountryName.unique()
test["ISO"]=gdp.CountryCode.unique()

for i in range(1991,2018):
    test[str(i)]=0
    df=terroryear[terroryear.Year==i]
    for j in df.Country.unique():
        test.loc[test.Country==j, [str(i)]] = df.loc[df.Country==j,"Attacks"].unique()[0]

test["1993"]=np.nan
test[test.keys()[2:29]] = round(test[test.keys()[2:29]].interpolate(method="linear",limit_direction='backward',axis=1,limit_area="inside"))
test[test.keys()[2:29]] = test[test.keys()[2:29]].astype('int32')
terrorattacks = test


test = pd.DataFrame()
test["Country"]=gdp.CountryName.unique()
test["ISO"]=gdp.CountryCode.unique()
for i in range(1991,2018):
    test[str(i)]=0
    df=terroryear[terroryear.Year==i]
    for j in df.Country.unique():
        #if(df.loc[df.Country==j,"Casualities"].unique()[0]>3000):
            #test.loc[test.Country==j, [str(i)]]=np.nan
       # else:
        test.loc[test.Country==j, [str(i)]] = df.loc[df.Country==j,"Casualities"].unique()[0]

test["1993"]=np.nan
test[test.keys()[2:29]] = round(test[test.keys()[2:29]].interpolate(method="linear",limit_direction='both',axis=1))
test[test.keys()[2:29]] = test[test.keys()[2:29]].astype('int32')
terrorkilled = test
In [16]:
# Drop Countries where no Terrorist attacks are available
# Focus on countries that are available in both datasets.
unemp = unemp.fillna(unemp.mean()) #Impute with mean for "Antigua and Barbuda" and "Dominica"
gdp_terror = gdp[terrorattacks.sum(axis=1)!=0]
capita_terror = capita[terrorattacks.sum(axis=1)!=0]
unemp_terror = unemp[terrorattacks.sum(axis=1)!=0]
life_terror = life[terrorattacks.sum(axis=1)!=0]
growth_terror = capita_growth[terrorattacks.sum(axis=1)!=0]

terrorkilled = terrorkilled[terrorattacks.sum(axis=1)!=0]
terrorattacks = terrorattacks[terrorattacks.sum(axis=1)!=0]
In [17]:
# Generating a dataset for model building.
# Include different measures for GDP (log,normalized,change in %)
# Calculate values for previous terrorist attacks
test=pd.DataFrame()
for i in range(1992,2018):
    empty=pd.DataFrame()
    empty["Country"]=gdp_terror.CountryName
    empty["Code"]=gdp_terror.CountryCode
    empty["Year"]=i
    empty["Small_Country"]=(gdp_terror[str(i-1)]<=np.quantile(gdp_terror[str(i-1)],0.25))
    empty["GDP_log"]=np.log(gdp_terror[str(i-1)])
    empty["GDP_normal"]=(gdp_terror[str(i-1)]-gdp_terror[str(i-1)].mean())/gdp_terror[str(i-1)].std()
    empty["GDP_change"]=(gdp_terror[str(i-1)]-gdp_terror[str(i-2)])/gdp_terror[str(i-2)]
    empty["GDP_per_Capita"]=capita_terror[str(i-1)]/capita_terror[str(i-1)].max()
    empty["GDP_per_Capita_Change"]=growth_terror[str(i-1)]
    empty["Unemployment"]=unemp_terror[str(i-1)]
    empty["Life_Expectancy"]=life_terror[str(i-1)]
    empty["Terrorist_Attacks"]=terrorattacks[str(i)]
    empty["Casualities"]=terrorkilled[str(i)]
    if(i<=1992):
        empty["Trend"]=0
    else:
        x=terrorattacks[str(i-1)]-terrorattacks[str(i-2)]
        x.reset_index(drop=True,inplace=True)
        for j in range(0,len(x)):
            x[x[j]>0]=1
            x[x[j]<0]=-1
        empty["Trend"]=list(x)

        
    if(i<=1993):
        if(i==1991):
            empty["Attacks_last_3_Years"] = terrorattacks[str(i)]
            empty["Casualities_last_3_Years"] = terrorkilled[str(i)]
        else:
            empty["Attacks_last_3_Years"] = terrorattacks[(str(i) for i in list(range(1991,i)))].mean(axis=1)
            empty["Casualities_last_3_Years"] = terrorkilled[(str(i) for i in list(range(1991,i)))].mean(axis=1)
    else:
        empty["Attacks_last_3_Years"] = terrorattacks[(str(i) for i in list(range(i-3,i)))].mean(axis=1)
        empty["Casualities_last_3_Years"] = terrorkilled[(str(i) for i in list(range(i-3,i)))].mean(axis=1)

        
    if(i<=1995):
        if(i<=1991):
            empty["Attacks_last_5_Years"] = terrorattacks[str(i)]
            empty["Casualities_last_5_Years"] = terrorkilled[str(i)]
        else:
            empty["Attacks_last_5_Years"] = terrorattacks[(str(i) for i in list(range(1991,i)))].mean(axis=1)
            empty["Casualities_last_5_Years"] = terrorkilled[(str(i) for i in list(range(1991,i)))].mean(axis=1)
    else:
        empty["Attacks_last_5_Years"] = terrorattacks[(str(i) for i in list(range(i-5,i)))].mean(axis=1)
        empty["Casualities_last_5_Years"] = terrorkilled[(str(i) for i in list(range(i-5,i)))].mean(axis=1)
    
    
    if(i==1991):
        empty["Attacks_last_Year"]=round(terrorattacks[str(i)])
        empty["Casualities_last_Year"]=round(terrorkilled[str(i)])
    else:
        empty["Attacks_last_Year"]=terrorattacks[str(i-1)]
        empty["Casualities_last_Year"]=terrorkilled[str(i-1)]


    empty["High_Terrorism"]=(empty["Casualities_last_5_Years"]>100)
    empty["Medium_Terrorism"]=(empty.Casualities_last_5_Years<=100)&(empty.Casualities_last_5_Years>5)
    empty["Low_Terrorism"]=(empty.Casualities_last_5_Years<=5)
    
    
    if(len(test)==0):
        test=empty.copy()
    else:
        test = test.append(empty,ignore_index=True)
fulldata = test.copy()

Hypotheses

Correlations

  • Does unemployment influence terrorism within a country?
  • Does GDP per capita influence terrorism within a country?

For both Hypotheses we compare the average values of terrorism in a country with the repsective average values of Unemployment or GDP per capita. Scatterplots for both Hypotheses are in Section "Data Visualization"

For Unemployment we calculate a correlation of -0.04 and for Unemployment -0.1. In both cases these correlation values fall below the absolute 0.3 threshold that we set in the beginning. Therefore we keep the Hypotheses, that terrorism is not correlated with these variables.

However, a combination of these variables could still influence terrorism. Therefore we will examine which of our variables have an influence on terrorism in the next step. We will try to build models and predict terrorism with our data in the next step.

Predictions

  • Which economical and social influences have an influence terrorism?
  • What data is necessary to predict terrorism in the next year?

We use prediction models to answer these questions. We obtain the Regression Coefficient from Ridge Regression aswell as the Variable Importance from Random Forest to make a decision which variables have an influence on terrorism.

Prediction with

  • Ridge Regression (alpha=1)
  • Random Forest (n_estimators=100)

In a first step we use only economical and social data for the models. In a later step we will include previous terrorist data to see, if it improves our model and enables good predictions.

General Remarks:

  • Time-based Train-Test-Split (2018 used for test set)
  • Economical and Social Data has Importance [0,25] (Random Forest)
  • Ridge, Lasso and Regression have pretty equal performance
  • Regression Methods may predict less than 0 Casualities -> Cut to 0
  • Very bad performance from Regression Models (over 300 MAE)
  • Excluding heavy outliers (over 1000 casualities) results in high increase of performance for both estimators
  • Including Past Terrorist Data Improves the models
  • Best performance can be achieved with Random Forest by excluding outliers and including past terrorist data.
In [18]:
terrorkilled_avg = terrorkilled[terrorkilled.keys()[2:]].mean(axis=1)
np.corrcoef(unemp_terror.Average,terrorkilled_avg)
Out[18]:
array([[ 1.        , -0.04951075],
       [-0.04951075,  1.        ]])
In [19]:
# Correlation between Unemployment and Casualities
np.corrcoef(capita_terror.Average,terrorkilled_avg)
Out[19]:
array([[ 1.        , -0.10263671],
       [-0.10263671,  1.        ]])
In [20]:
#Selecting a subset of data (economic and social data) for the first model
test = fulldata#.loc[fulldata.Casualities<1000] #Possibility of Excluding high casualities
test = test.loc[test.Year!=1991]
predictors = ["Small_Country","GDP_per_Capita","GDP_log","GDP_normal","GDP_change",
              'Unemployment','Life_Expectancy',"GDP_per_Capita_Change"]
X = test[predictors]
Y = test["Casualities"]
In [21]:
# Alpha=1 performs best in RidgeCV
clf = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1,10,100,1000],store_cv_values=True)
mod = clf.fit(X, Y)
np.mean(mod.cv_values_, axis=0)
Out[21]:
array([1170870.96859475, 1170870.73622281, 1170868.42827024,
       1170846.89308431, 1170758.29884713, 1172905.16536123,
       1178782.77615953])
In [22]:
# Time-based Train-Test-Split. Use 2018 for prediction
year = 2017
X_train = X.loc[test.Year<=year]
X_test = X.loc[test.Year==year]
Y_train = Y.loc[test.Year<=year]
Y_test = Y.loc[test.Year==year]


tree = RandomForestRegressor(n_estimators=100)
tree.fit(X_train, Y_train) 
#regr = LinearRegression()
regr = Ridge(alpha=1)
regr.fit(X_train, Y_train) 

#coeff = pd.DataFrame()
#coeff["Name"]=predictors
#coeff["Value"]=np.around(regr.coef_,2)
#coeff
Out[22]:
Ridge(alpha=1, copy_X=True, fit_intercept=True, max_iter=None, normalize=False,
      random_state=None, solver='auto', tol=0.001)
In [23]:
# Make predictions
# Random Forests perform very good. Ridge rather bad
pred_tree = tree.predict(X_test)
print("MAE Random Forest:", metrics.mean_absolute_error(pred_tree,Y_test))
print("RMSE Random Forest:", math.sqrt(metrics.mean_squared_error(pred_tree,Y_test)))

pred_regr = regr.predict(X_test)
pred_regr[pred_regr<0]=0
print("MAE Ridge Regression:", metrics.mean_absolute_error(pred_regr,Y_test))
print("RMSE Ridge Regression:", math.sqrt(metrics.mean_squared_error(pred_regr,Y_test)))
MAE Random Forest: 86.36169934640523
RMSE Random Forest: 369.8747721202143
MAE Ridge Regression: 398.761347410493
RMSE Ridge Regression: 1329.809425849711
In [24]:
# Get Regression Coefficient and Random Forest Importance for the variables
coef = pd.DataFrame()
coef["Name"]=predictors
coef["Coefficient"]=regr.coef_
coef["Importance"]=tree.feature_importances_
coef
Out[24]:
Name Coefficient Importance
0 Small_Country 2.904512 0.000410
1 GDP_per_Capita -694.440886 0.081373
2 GDP_log 102.900846 0.143455
3 GDP_normal -49.426143 0.157763
4 GDP_change -27.686703 0.105143
5 Unemployment -5.187424 0.221449
6 Life_Expectancy -6.783373 0.179840
7 GDP_per_Capita_Change 1.043740 0.110566
In [25]:
# Compare the predictions for 2018.
comp1 = pd.DataFrame()
comp1["Country"]=test.loc[test.Year==year,"Country"]
comp1["Casualities"]=Y_test
comp1["RandomForest"]=np.round(pred_tree)
comp1["Regression"]=np.round(pred_regr)

comp1
Out[25]:
Country Casualities RandomForest Regression
3825 Afghanistan 11698 9604.0 312.0
3826 Angola 416 311.0 454.0
3827 Albania 0 0.0 72.0
3828 United Arab Emirates 0 10.0 262.0
3829 Argentina 0 4.0 443.0
... ... ... ... ...
3973 Uzbekistan 0 6.0 381.0
3974 Vanuatu 0 0.0 0.0
3975 South Africa 28 317.0 429.0
3976 Zambia 0 3.0 295.0
3977 Zimbabwe 1 250.0 315.0

153 rows × 4 columns

In [26]:
plt.scatter(Y_test, pred_tree, linewidth=2)
plt.xlabel("Truth")
plt.ylabel("Prediction")
plt.xlim(-5, 1000)
plt.ylim(-5, 1000)
plt.show()

Prediction based on economical and social data does not work well with the regression. It fails especially for countries with high Casualities (Afghanistan) as it heavily underpredicts. A solution to this would be the usage of previous terrorism data. Random Forests already perform quite good on the data. We expect the performance to increase further by including previous data.

In [27]:
#
# Including previous terrostic activity
#

predictors = ["Small_Country","GDP_per_Capita","GDP_log","GDP_normal","GDP_change",
              'Unemployment','Life_Expectancy',"GDP_per_Capita_Change",
              "Casualities_last_Year","Casualities_last_5_Years","Trend",
              "High_Terrorism","Medium_Terrorism","Low_Terrorism"]
X2 = test[predictors]
Y2 = test["Casualities"]

year = 2017
X2_train = X2.loc[test.Year<=year]
X2_test = X2.loc[test.Year==year]
Y2_train = Y2.loc[test.Year<=year]
Y2_test = Y2.loc[test.Year==year]


tree2 = RandomForestRegressor(n_estimators=100)
tree2.fit(X2_train, Y2_train) 
#regr = LinearRegression()
regr2 = Ridge(alpha=1)
regr2.fit(X2_train, Y2_train)

pred_tree2 = tree2.predict(X2_test)
print("MAE Random Forest:", metrics.mean_absolute_error(pred_tree2,Y2_test))
print("RMSE Random Forest:", math.sqrt(metrics.mean_squared_error(pred_tree2,Y2_test)))


pred_regr2 = regr2.predict(X2_test)
pred_regr2[pred_regr2<0]=0
print("MAE Ridge Regression:", metrics.mean_absolute_error(pred_regr2,Y2_test))
print("RMSE Ridge Regression:", math.sqrt(metrics.mean_squared_error(pred_regr2,Y2_test)))

comp2 = pd.DataFrame()
comp2["Country"]=test.loc[test.Year==year,"Country"]
comp2["Casualities"]=Y2_test
comp2["RandomForest"]=np.round(pred_tree2)
comp2["Regression"]=np.round(pred_regr2)

comp2
MAE Random Forest: 59.68156862745096
RMSE Random Forest: 359.51120910647916
MAE Ridge Regression: 164.68671394462126
RMSE Ridge Regression: 1012.4845076642346
Out[27]:
Country Casualities RandomForest Regression
3825 Afghanistan 11698 12043.0 11420.0
3826 Angola 416 218.0 13.0
3827 Albania 0 0.0 0.0
3828 United Arab Emirates 0 0.0 8.0
3829 Argentina 0 1.0 14.0
... ... ... ... ...
3973 Uzbekistan 0 1.0 13.0
3974 Vanuatu 0 0.0 0.0
3975 South Africa 28 38.0 59.0
3976 Zambia 0 0.0 11.0
3977 Zimbabwe 1 1.0 17.0

153 rows × 4 columns

In [28]:
plt.scatter(Y2_test, pred_tree2, linewidth=2)
plt.xlabel("Truth")
plt.ylabel("Prediction")
plt.xlim(-5, 1000)
plt.ylim(-5, 1000)
plt.show()

Including previous terrorist data helps the Regression Model as it provides a starting point for countries with very high terrorist activity. It seems, that the level on terrorism is also heavily dependent on the previous activity of terrorism.

In the last try we exclude the countries with very high terrorist activity (>1000 Casualities per Year) and build two models. As we reduce the span of casualities to the range of 0-1000 we suspect that the mean average error will reduce further.

In [29]:
#
# Including previous terrostic activity, excluding high outliers
#

test2 = fulldata.loc[fulldata.Casualities<1000] #Possibility of Excluding high casualities
predictors = ["Small_Country","GDP_per_Capita","GDP_log","GDP_normal","GDP_change",
              'Unemployment','Life_Expectancy',"GDP_per_Capita_Change",
              "Casualities_last_Year","Casualities_last_5_Years","Trend",
              "High_Terrorism","Medium_Terrorism","Low_Terrorism"]
X3 = test2[predictors]
Y3 = test2["Casualities"]

year = 2017
X3_train = X3.loc[test2.Year<year]
X3_test = X3.loc[test2.Year==year]
Y3_train = Y3.loc[test2.Year<year]
Y3_test = Y3.loc[test2.Year==year]


tree3 = RandomForestRegressor(n_estimators=100)
tree3.fit(X3_train, Y3_train) 
#regr = LinearRegression()
regr3 = Ridge(alpha=1)
regr3.fit(X3_train, Y3_train)

pred_tree3 = tree3.predict(X3_test)
print("MAE Random Forest:", metrics.mean_absolute_error(pred_tree3,Y3_test))
print("RMSE Random Forest:", math.sqrt(metrics.mean_squared_error(pred_tree3,Y3_test)))

pred_regr3 = regr3.predict(X3_test)
pred_regr3[pred_regr3<0]=0
print("MAE Ridge Regression:", metrics.mean_absolute_error(pred_regr3,Y3_test))
print("RMSE Ridge Regression:", math.sqrt(metrics.mean_squared_error(pred_regr3,Y3_test)))

comp3 = pd.DataFrame()
comp3["Country"]=test2.loc[test.Year==year,"Country"]
comp3["Casualities"]=Y3_test
comp3["RandomForest"]=np.round(pred_tree3)
comp3["Regression"]=np.round(pred_regr3)

comp3
MAE Random Forest: 39.67930555555555
RMSE Random Forest: 91.20903531022692
MAE Ridge Regression: 43.18992178359494
RMSE Ridge Regression: 96.55486483410145
Out[29]:
Country Casualities RandomForest Regression
3826 Angola 416 2.0 24.0
3827 Albania 0 1.0 0.0
3828 United Arab Emirates 0 1.0 14.0
3829 Argentina 0 2.0 27.0
3830 Armenia 0 4.0 11.0
... ... ... ... ...
3973 Uzbekistan 0 2.0 19.0
3974 Vanuatu 0 0.0 0.0
3975 South Africa 28 54.0 61.0
3976 Zambia 0 2.0 12.0
3977 Zimbabwe 1 0.0 14.0

144 rows × 4 columns

In [30]:
plt.scatter(Y3_test, pred_tree3, linewidth=2)
plt.xlabel("Truth")
plt.ylabel("Prediction")
plt.xlim(-5, 1000)
plt.ylim(-5, 1000)
plt.show()

Lots of countries in the dataset have not experienced Casualities within 2018. The Random Forest Model manages to correctly identify these countries while also providing reasonable predictions for countries with actual Casualities. Overall, the Random Forest Model is performing very good based on MAE.

Data Visualization

Contains the following plots:

  • Correlation Plots to Hypotheses
  • Terrorism world plot
  • Terrorism EU Plot
  • GDP per capita world plot
In [31]:
df = pd.DataFrame()
df["Year"]=capita_terror.keys()[2:29]
df["GDP per capita"]=capita_terror[capita_terror.keys()[2:29]].mean().values
df["Casualities"]=terrorkilled[terrorkilled.keys()[2:29]].sum().values
fig = px.scatter(df, x="GDP per capita", y="Casualities",title="GDP per capita vs. Casualities")
fig.show()
In [32]:
df = pd.DataFrame()
df["Year"]=unemp_terror.keys()[2:29]
df["Unemployment_Rate"]=unemp_terror[unemp_terror.keys()[2:29]].mean().values
df["Casualities"]=terrorkilled[terrorkilled.keys()[2:29]].sum().values
fig = px.scatter(df, x="Unemployment_Rate", y="Casualities",title="Unemployment Rate vs. Casualities")
fig.show()
In [33]:
bins = [0, 100, 500, 1000, 4000, 8000, 200000]
labels = ["1","2","3","4","5","6"]
terrorplot = terrorgroup.copy()
terrorplot["Attack_Group"] = pd.cut(terrorplot["Attacks"], bins=bins, labels=labels)
fig = px.choropleth(terrorplot, locations="ISO",
                    color="Attack_Group", # lifeExp is a column of gapminder
                    hover_name="Country",
                    hover_data=["ISO","Attacks"],
                    title="Terrorist attacks between 1991 and 2017",# column to add to hover information
                    color_continuous_scale=px.colors.sequential.Reds)
fig.show()
In [34]:
bins = [0, 100, 500, 1000, 5000, 15000, 200000]
labels = ["1","2","3","4","5","6"]
terrorplot = terrorgroup.copy()
terrorplot["Attack_Group"] = pd.cut(terrorplot["Casualities"], bins=bins, labels=labels)
fig = px.choropleth(terrorplot, locations="ISO",
                    color="Attack_Group", # lifeExp is a column of gapminder
                    hover_name="Country",
                    hover_data=["ISO","Casualities"],
                    title="Casualities of Terrorist attacks between 1991 and 2017",# column to add to hover information
                    color_continuous_scale=px.colors.sequential.Reds)
fig.show()
In [35]:
bins = [0, 1000, 5000, 10000, 30000, 50000, 200000]
labels = ["1","2","3","4","5","6"]
df = capita.copy()
df['GDP_Group'] = pd.cut(df['2017'], bins=bins, labels=labels)
fig = px.choropleth(df, locations="CountryCode",
                    color="GDP_Group", # lifeExp is a column of gapminder
                    hover_name="CountryName",
                    hover_data=["CountryCode","2017"],
                    title="GDP per Capita 2017",
                    scope="world")

fig.show()
In [36]:
#bins = [0, 1000, 5000, 10000, 30000, 50000, 200000]
#labels = ["1","2","3","4","5","6"]
df = unemp.copy()
#df['GDP_Group'] = pd.cut(df['2018'], bins=bins, labels=labels)
fig = px.choropleth(df, locations="CountryCode",
                    color="2017", # lifeExp is a column of gapminder
                    hover_name="CountryName",
                    hover_data=["CountryCode","2017"],
                    title="Unemployment in 2017",
                    scope="world",
                    color_continuous_scale=px.colors.sequential.Reds)

fig.show()
In [37]:
#bins = [0, 1000, 5000, 10000, 30000, 50000, 200000]
#labels = ["1","2","3","4","5","6"]
df = life.copy()
#df['GDP_Group'] = pd.cut(df['2018'], bins=bins, labels=labels)
fig = px.choropleth(df, locations="CountryCode",
                    color="2017", # lifeExp is a column of gapminder
                    hover_name="CountryName",
                    hover_data=["CountryCode","2017"],
                    title="Life Expectancy in 2017",
                    scope="world",
                    color_continuous_scale=px.colors.sequential.Viridis)

fig.show()
In [38]:
bins = [0, 25,50, 100, 500, 2000, 10000]
labels = ["1","2","3","4","5","6"]
terrorplot = terrorgroup.copy()
terrorplot["Attack_Group"] = pd.cut(terrorplot["Attacks"], bins=bins, labels=labels)
fig = px.choropleth(terrorplot, locations="ISO",
                    color="Attack_Group", # lifeExp is a column of gapminder
                    hover_name="Country",
                    hover_data=["ISO","Attacks"],
                    scope="europe",
                    title="Terrorist attacks in Europe between 1991 and 2018",# column to add to hover information
                    color_continuous_scale=px.colors.sequential.Reds)
fig.show()
In [39]:
df = fulldata.loc[fulldata.Country=="Germany"]
fig = px.line(df, x="Year", y="Casualities",title="Terror Attacks in Germany by Year 1991-2017")
fig.show()
In [40]:
df = pd.DataFrame()
df["Year"]=terrorattacks.keys()[2:29]
df["Attacks"]=terrorattacks[terrorattacks.keys()[2:29]].sum().values
fig = px.line(df, x="Year", y="Attacks",title="Terror Attacks by Year 1991-2017")
fig.show()
In [41]:
df = pd.DataFrame()
df["Year"]=terrorkilled.keys()[3:29]
df["Attacks"]=terrorkilled[terrorkilled.keys()[3:29]].sum().values
fig = px.line(df, x="Year", y="Attacks",title="Terror Casualities by Year 1992-2017")
fig.show()
In [42]:
df = pd.DataFrame()
df["Year"]=gdp_terror.keys()[3:29]
df["GDP"]=gdp_terror[gdp_terror.keys()[3:29]].sum().values
fig = px.line(df, x="Year", y="GDP",title="Evolution of GDP worldwide")
fig.show()
In [43]:
df = pd.DataFrame()
df["Year"]=capita_terror.keys()[2:29]
df["GDP"]=gdp_terror[capita_terror.keys()[2:29]].mean().values
df["Attacks"]=terrorattacks[terrorattacks.keys()[2:29]].sum().values
fig = px.scatter(df, x="GDP", y="Attacks",title="GDP Worldwide vs Terrorist Attacks")
fig.show()
In [44]:
df = pd.DataFrame()
df["Year"]=life_terror.keys()[2:29]
df["GDP"]=life_terror[life_terror.keys()[2:29]].mean().values
df["Attacks"]=terrorattacks[terrorattacks.keys()[2:29]].sum().values
fig = px.scatter(df, x="GDP", y="Attacks",title="Life Expectancy Worldwide vs Terrorist Attacks")
fig.show()
In [45]:
#df = pd.DataFrame()
#df["Year"]=gdp_terror.keys()[2:29]
#df["Value"]=gdp_terror[capita_terror.keys()[2:29]].mean().values/gdp_terror[capita_terror.keys()[2:29]].mean().max()*100
#df["Type"]="GDP"
#df1 = pd.DataFrame()
#df1["Year"]=capita_terror.keys()[2:29]
#df1["Value"]=capita_terror[capita_terror.keys()[2:29]].mean().values/capita_terror[capita_terror.keys()[2:29]].mean().max()*100
#df1["Type"]="GDP per capita"
#df2 = pd.DataFrame()
#df2["Year"]=terrorattacks.keys()[2:29]
#df2["Value"]=terrorattacks[capita_terror.keys()[2:29]].mean().values/terrorattacks[capita_terror.keys()[2:29]].mean().max()*100
#df2["Type"]="Terror Attacks"
#df3=pd.DataFrame()
#df3["Year"]=unemp_terror.keys()[2:29]
#df3["Value"]=unemp_terror[capita_terror.keys()[2:29]].mean().values
#df3["Type"]="Unemployment"

#df4=pd.DataFrame()
#df4["Year"]=life_terror.keys()[2:29]
#df4["Value"]=life_terror[capita_terror.keys()[2:29]].mean().values
#df4["Type"]="Life Expectancy"

#df = df.append(df1,ignore_index = True)
#df = df.append(df2,ignore_index = True)
#df = df.append(df3,ignore_index = True)
#df = df.append(df4,ignore_index = True)

#fig = px.line(df, x="Year", y="Value",color="Type",title="GDP worldwide vs. GPD per capita vs. Terrorist Attacks")
#fig.show()
In [ ]: